A Glimpse of Firearms in the U.S.

*Table of Contents*

Introduction

Possessing, buying and selling firearms and ammunition has been common in the United States for centuries. However, gun-related violence and deaths have been increasing, especially since the COVID-19 pandemic. With the help of **FBI gun data**, this analysis explores the geography, growth and other trends of firearms in the U.S.

The gun data comes from the FBI's National Instant Criminal Background Check System, or [NICS](https://www.fbi.gov/services/cjis/nics), while the census data comes from the [U.S. Census Bureau](https://www.census.gov/).

Data Cleaning

An initial assessment shows that the two datasets are hard to clean and filter, for the following reasons:

  • Very large scale: the gun data has more than 12,000 rows, and the census data has around 80 attributes.
  • Missing data: both datasets contain many NaNs, and the census data uses letters in place of some numeric values.
  • Incorrect data types: every column in the census data has the object dtype, while most should be ints or floats.
  • Transpose needed: states are columns in the census data but rows in the gun data.

**Therefore, careful investigation and selection must be carried out before data cleaning, in order to reduce the workload.**

*Part 1: census data cleaning*


  • The CSV file consists of 85 rows of attributes and 52 columns of states. It needs to be transposed, and its many missing values need to be dropped.
  • The data types are messy: values such as 0.17, 20% and 'Z' can appear in the same column, all stored as strings, which makes them very challenging to clean.
  • The challenging part is turning all data into a unified numerical format.

    The data types and formats differ not just between columns but also between rows.

    First, the missing values are dropped, along with the supplementary notes. Using those notes as a guide, we can locate the special letters in the dataset and deal with them. After that, further cleaning can be carried out.

  • The dataframe needs to be transposed, so that states become the index and facts become the columns.
  • The names in 'Fact' are quite long and need to be replaced with shorter ones.
  • Finally, after these steps, a cleaned dataframe was created as census_df.
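
The census cleaning steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the toy values, the state labels, the short column names and the helper `to_number` are all assumptions, and so is the choice to treat footnote letters such as 'Z' as missing.

```python
import numpy as np
import pandas as pd


def to_number(value):
    """Convert one raw census cell to a float, or NaN if it holds no number."""
    if pd.isna(value):
        return np.nan
    text = str(value).strip().replace(",", "").replace("$", "")
    scale = 1.0
    if text.endswith("%"):
        # "20%" becomes 0.20, the same scale as a fraction such as "0.17"
        text, scale = text[:-1], 0.01
    try:
        return float(text) * scale
    except ValueError:
        # Assumption: footnote letters such as "Z" are treated as missing
        return np.nan


# Toy stand-in for the raw census file: facts in rows, states in columns
raw = pd.DataFrame(
    {
        "Fact": ["Population estimate (toy)", "Persons in poverty (toy)"],
        "State A": ["1,000", "20%"],
        "State B": ["2,500", "Z"],
    }
)

# Transpose so that states form the index and facts form the columns
census_df = raw.set_index("Fact").T
census_df.index.name = "state"

# Parse every cell into a float, column by column
census_df = census_df.apply(lambda col: col.map(to_number).astype(float))

# Replace the long fact names with shorter ones (hypothetical names)
census_df = census_df.rename(
    columns={
        "Population estimate (toy)": "population",
        "Persons in poverty (toy)": "poverty_pct",
    }
)
```

Parsing cell by cell is slow on large tables, but the census file is small (tens of rows and columns), so readability is probably the better trade-off here.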

*Part 2: gun data cleaning*


  • The CSV file consists of 2 columns for time and state, around 25 columns of different gun data, and more than 12,000 rows.
  • Most columns have a lot of missing data. The exceptions are the month and state columns and two gun data columns, one of which is called 'totals'.
  • The column 'totals' is nothing but a beautiful trap.

    This column has no missing values and sounds like total gun sales, which looks really good at first glance. However...

    By calculation, we found that 'totals' is the sum of all numeric values in a row, including new and regained gun permits, normal gun sales, and even returns and redemptions. As a result, 'totals' does not represent any meaningful quantity.

  • Fortunately, data cleaning yields good alternatives that represent the annual increase in permits and in guns.
  • The new dataframe covers 2010 to 2016 on an annual basis, so it can easily be merged with the census data.
  • Finally, a cleaned dataframe was created as gun_df, with 4 key columns: year, state, permit_growth and gun_growth.
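
The check on 'totals' and the annual aggregation described above might look like the sketch below. The names 'month', 'state', 'totals', gun_df, permit_growth and gun_growth come from the text. The other column names, the toy numbers, and the decision to define gun_growth as handguns plus long guns are assumptions made only for illustration.

```python
import pandas as pd

# Toy stand-in for the monthly gun data (count column names are assumed)
gun_raw = pd.DataFrame(
    {
        "month": ["2015-01", "2015-02", "2016-01", "2016-02"],
        "state": ["State A"] * 4,
        "permit": [10, 20, 30, 40],
        "handgun": [5, 6, 7, 8],
        "long_gun": [1, 2, 3, 4],
        "returned_handgun": [0, 1, 0, 1],
        "totals": [16, 29, 40, 53],
    }
)

# Check whether 'totals' is simply the row-wise sum of every count column,
# which would mix permits, sales and returns into one number
count_cols = ["permit", "handgun", "long_gun", "returned_handgun"]
totals_is_row_sum = bool(
    (gun_raw[count_cols].sum(axis=1) == gun_raw["totals"]).all()
)

# Derive the year and keep only the range that overlaps the census data
gun_raw["year"] = pd.to_datetime(gun_raw["month"], format="%Y-%m").dt.year
in_range = gun_raw[gun_raw["year"].between(2010, 2016)]

# Aggregate monthly counts into annual permit and gun accruals
gun_df = (
    in_range.assign(
        permit_growth=in_range["permit"],
        gun_growth=in_range["handgun"] + in_range["long_gun"],
    )
    .groupby(["year", "state"], as_index=False)[["permit_growth", "gun_growth"]]
    .sum()
)
```

Building the two measures from explicitly chosen columns, instead of reusing 'totals', keeps returns and redemptions out of the growth figures.
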

Data Analysis

Some questions need to be answered:

  • Which characteristics are closely related to gun sales in the U.S.?
  • Are the correlations positive or negative?
  • Is there any seasonality in gun sales or registration?
  • Are there any geographic patterns in the gun data?
  • Are there any general trends in gun sales or registration over the past decades?

Many new dataframes are created to perform correlation tests between attributes.
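
One way such a correlation test could be set up is sketched below. It assumes that gun_df and census_df share a `state` key and that the census side has a `population_growth` column. Both assumptions, like the toy numbers, are illustrative and not taken from the original notebook.

```python
import pandas as pd

# Toy census attributes per state (hypothetical values)
census_df = pd.DataFrame(
    {
        "state": ["State A", "State B", "State C", "State D"],
        "population_growth": [0.01, 0.03, 0.05, 0.08],
    }
)

# Toy annual gun data in the gun_df layout: year, state, permit_growth, gun_growth
gun_df = pd.DataFrame(
    {
        "year": [2015, 2016] * 4,
        "state": ["State A", "State A", "State B", "State B",
                  "State C", "State C", "State D", "State D"],
        "permit_growth": [5, 6, 9, 1, 3, 4, 7, 2],
        "gun_growth": [10, 12, 20, 22, 28, 33, 40, 45],
    }
)

# Collapse the annual rows into one row per state
state_totals = gun_df.groupby("state", as_index=False)[
    ["permit_growth", "gun_growth"]
].sum()

# Join the gun measures with the census attributes on the state name
merged = state_totals.merge(census_df, on="state", how="inner")

# Pairwise Pearson correlation between the attributes of interest
corr = merged[["population_growth", "gun_growth", "permit_growth"]].corr(
    method="pearson"
)
gun_vs_population = corr.loc["population_growth", "gun_growth"]
```

Pearson correlation only captures linear relationships, so a scatter plot of each pair is a useful companion check before reading much into the coefficient.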

*Part 1: Correlations*

*Table of Contents*

Population, Gender & Race, Poverty, Immigration, Education

Population Growth, Population Density and Gun Growth

  • Based on data from 2010 to 2016, population growth has a strong positive correlation with gun growth.
  • However, population growth shows little correlation with permit growth, probably due to policy issues.

  • Since the total number of guns cannot be calculated from the given data, we use per capita growth in the number of guns over selected years as a proxy for gun density.
  • States with lower population density have higher gun density, perhaps due to hunting needs.
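
The gun density proxy described above can be written as a simple ratio. In the sketch below, the column names `population` and `land_area_sq_mi` and all numbers are hypothetical. It only illustrates how per capita gun growth and population density could be derived and compared.

```python
import pandas as pd

# Toy merged table, one row per state (all values are made up)
merged = pd.DataFrame(
    {
        "state": ["State A", "State B", "State C"],
        "population": [1_000_000, 5_000_000, 20_000_000],
        "land_area_sq_mi": [100_000, 50_000, 40_000],
        "gun_growth": [300_000, 500_000, 1_000_000],
    }
)

# Gun density proxy: guns added over the selected years, per resident
merged["gun_density"] = merged["gun_growth"] / merged["population"]

# Population density: residents per square mile of land
merged["population_density"] = merged["population"] / merged["land_area_sq_mi"]

# Pearson correlation between the two densities (negative for this toy data)
density_corr = merged["population_density"].corr(merged["gun_density"])
```

Dividing by population keeps large states from dominating the comparison purely because of their size.
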

Gender, Race and Gun Growth

  • States with a higher female% usually have lower gun density. This seems plausible, given the stereotype that men are fonder of guns.

  • The major races in the U.S. are Black, Hispanic and white. For Native American and Asian populations, data are missing in some states because those populations are much smaller.
  • White% in a state has a clear positive correlation with gun density, probably due to the hunting tradition among white farmers.
  • However, Black% and Latino% both have a weak negative correlation with gun density, which contradicts the hate speech directed at those minorities.

Poverty, Financial Status and Gun Growth

  • Financial situation can be represented by income level, poverty%, unemployment% and uninsured%.
  • Higher income and lower rates of poverty, unemployment and uninsurance indicate a better financial situation, which is correlated with lower gun density.

  • In addition, the burden of housing costs, measured as mortgage or rent as a percentage of income, also reflects financial situation.
  • However, higher housing cost is correlated with lower gun density, possibly because there is less need to own guns in metropolitan areas.

Foreign Immigration and Gun Growth

  • The percentage of the population that speaks a foreign language can effectively reflect the density of first-generation immigrants in a state.
  • States with more immigrants tend to have lower gun density.
  • This is probably because new immigrants prefer to live in states with higher population density, such as CA and NY, and most of them do not come from a gun culture.